Resilin: Elastic MapReduce over Multiple Clouds

Author: Iordache Anca
Morin Christine
Parlavantzas Nikos
Riteau Pierre
Publication venue: HAL CCSD
Publication date: 01/10/2012

The MapReduce programming model, introduced by Google, offers a simple and efficient way of performing distributed computation over large data sets. Although Google's implementation is proprietary, MapReduce can be leveraged by anyone using the free and open-source Apache Hadoop framework. To simplify the usage of Hadoop in the cloud, Amazon Web Services offers Elastic MapReduce, a web service enabling users to run MapReduce jobs. Elastic MapReduce takes care of resource provisioning, Hadoop configuration and performance tuning, data staging, fault tolerance, etc. This service drastically reduces the entry barrier to performing MapReduce computations in the cloud, allowing users to concentrate on the problem to solve. However, Elastic MapReduce is restricted to Amazon EC2 resources, and is provided at an additional cost. In this paper, we present Resilin, a system implementing the Elastic MapReduce API with resources from clouds other than Amazon EC2, such as private and scientific clouds. Furthermore, we explore a feature going beyond the current Amazon Elastic MapReduce offering: performing MapReduce computations over multiple distributed clouds. The evaluation of Resilin shows the benefits of running computations on more than one cloud. While not the most efficient way to perform Hadoop computations, it solves the problem of resource availability and adds flexibility regarding the type and price of resources.

HAL-CentraleSupelec

INRIA a CCSD electronic archive server

HAL-Rennes 1

Resilin: Elastic MapReduce for Private and Community Clouds

Author: Iordache Ancuta
Morin Christine
Riteau Pierre
Publication venue: HAL CCSD
Publication date: 01/10/2011

The MapReduce programming model, introduced by Google, offers a simple and efficient way of performing distributed computation over large data sets. Although Google's implementation is proprietary, MapReduce can be leveraged by anyone using the free and open-source Apache Hadoop framework. To simplify the usage of Hadoop in the cloud, Amazon Web Services offers Elastic MapReduce, a web service enabling users to run MapReduce jobs. Elastic MapReduce takes care of resource provisioning, Hadoop configuration and performance tuning, data staging, fault tolerance, etc. This service drastically reduces the entry barrier to performing MapReduce computations in the cloud, allowing users to concentrate on the problem to solve. However, Elastic MapReduce is restricted to Amazon EC2 resources, and is provided at an additional cost. In this paper, we present Resilin, a system implementing the Elastic MapReduce API with resources from clouds other than Amazon EC2, such as private and community clouds. Furthermore, we explore a feature going beyond the current Amazon Elastic MapReduce offering: performing MapReduce computations over multiple distributed clouds.

INRIA a CCSD electronic archive server

Hydroinformatics On The Cloud: Data Integration, Modeling And Information Communication For Flood Risk Management

Author: Armstrong Patrick
Demir Ibrahim
Goska Radoslaw
Keahey Kate
Mantilla Ricardo
Riteau Pierre
Seo Bongchul
Small Scott
Publication venue: CUNY Academic Works
Publication date: 01/08/2014

The Iowa Flood Information System (IFIS) is a web-based platform developed by the Iowa Flood Center (IFC) to provide access to flood inundation maps, real-time flood conditions, flood warnings and forecasts, flood-related data, information and interactive visualizations for communities in Iowa. The key elements of the IFIS are: (1) flood inundation maps, (2) autonomous “bridge sensors” that monitor water level in streams and rivers in real time, and (3) real-time flood forecasting models capable of providing flood warning to over 1000 communities in Iowa. The IFIS represents a hybrid of file and compute servers, including a High Performance Computing cluster, codes in different languages, data streams and web services, databases, scripts and visualizations. The IFIS processes raw data (50 GB/day) from NEXRAD radars, creates rainfall maps (3 GB/day) every 5 minutes, and integrates real-time data from over 600 sensors in Iowa. Even though the IFIS serves over 75,000 users in Iowa using local infrastructure, cloud computing can improve scalability, speed, cost efficiency, accessibility, security, resiliency and uptime. In this collaborative study between the Iowa Flood Center and the Nimbus team at the Argonne National Laboratory, we have analyzed the feasibility and price/performance measures of moving the MPI-based computations to the cloud, and assessed the response times of our interactive web-based system. Moving the system to the cloud, and making it independent and portable, would enable us to share our model easily with the flood research community. This presentation provides an overview of the tools and interfaces in the IFIS, and of the transition of the IFIS from a local infrastructure to a cloud computing environment.

City University of New York

Evaluating Streaming Strategies for Event Processing across Infrastructure Clouds

Author: Gabriel Antoniu
Kate Keahey
Pierre Riteau
Radu Tudoran
Sergey Panitkin
Publication date: 23/04/2020

Infrastructure clouds revolutionized the way in which we approach resource procurement by providing an easy way to lease compute and storage resources on short notice, for a short amount of time, and on a pay-as-you-go basis. This new opportunity, however, introduces new performance trade-offs. Making the right choices in leveraging different types of storage available in the cloud is particularly important for applications that depend on managing large amounts of data within and across clouds. An increasing number of such applications conform to a pattern in which data processing relies on streaming the data to a compute platform where a set of similar operations is repeatedly applied to independent chunks of data. This pattern is evident in virtual observatories such as the Ocean Observatory Initiative, in cases when new data is evaluated against existing features in geospatial computations or when experimental data is processed as a series of time events. In this paper, we propose two strategies for efficiently implementing such streaming in the cloud and evaluate them in the context of an ATLAS application processing experimental data. Our results show that choosing the right cloud configuration can improve overall application performance by as much as three times.

CiteSeerX

Evaluating Streaming Strategies for Event Processing across Infrastructure Clouds

Author: Antoniu Gabriel
Keahey Kate
Panitkin Sergey
Riteau Pierre
Tudoran Radu
Publication venue: IEEE
Publication date: 26/05/2014

Infrastructure clouds revolutionized the way in which we approach resource procurement by providing an easy way to lease compute and storage resources on short notice, for a short amount of time, and on a pay-as-you-go basis. This new opportunity, however, introduces new performance trade-offs. Making the right choices in leveraging different types of storage available in the cloud is particularly important for applications that depend on managing large amounts of data within and across clouds. An increasing number of such applications conform to a pattern in which data processing relies on streaming the data to a compute platform where a set of similar operations is repeatedly applied to independent chunks of data. This pattern is evident in virtual observatories such as the Ocean Observatory Initiative, in cases when new data is evaluated against existing features in geospatial computations or when experimental data is processed as a series of time events. In this paper, we propose two strategies for efficiently implementing such streaming in the cloud and evaluate them in the context of an ATLAS application processing experimental data. Our results show that choosing the right cloud configuration can improve overall application performance by as much as three times.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Plates-formes d'exécution dynamiques sur des fédérations de nuages informatiques

Author: Riteau Pierre
Publication venue: HAL CCSD
Publication date: 02/12/2011

The increasing need for computing power has led to parallel and distributed computing, which harness the power of large computing infrastructures in a concurrent manner. Recently, virtualization technologies have increased in popularity, thanks to hypervisor improvements, the shift to multi-core architectures, and the spread of Internet services. This has led to the emergence of cloud computing, a paradigm offering computing resources in an elastic, on-demand approach while charging only for consumed resources. In this context, this thesis proposes four contributions to leverage the power of multiple clouds. They follow two directions: the creation of elastic execution platforms on top of federated clouds, and inter-cloud live migration for using them in a dynamic manner. We propose mechanisms to efficiently build elastic execution platforms on top of multiple clouds using the sky computing federation approach. Resilin is a system for creating and managing MapReduce execution platforms on top of federated clouds, making it easy to execute MapReduce computations without interacting with low-level cloud interfaces. We propose mechanisms to reconfigure virtual network infrastructures in the presence of inter-cloud live migration, implemented in the ViNe virtual network from the University of Florida. Finally, Shrinker is a live migration protocol improving the migration of virtual clusters over wide area networks by eliminating duplicated data between virtual machines.

HAL-CentraleSupelec

Thèses en Ligne

INRIA a CCSD electronic archive server

HAL-Rennes 1

Building Dynamic Computing Infrastructures over Distributed Clouds

Author: Riteau Pierre
Publication venue: Institute of Electrical and Electronics Engineers (IEEE)
Publication date: 16/05/2011

The emergence of cloud computing infrastructures brings new ways to build and manage computing systems, with the flexibility offered by virtualization technologies. In this context, this PhD thesis focuses on two principal objectives. First, leveraging virtualization and cloud computing infrastructures to build distributed large-scale computing platforms from multiple cloud providers, making it possible to run software requiring large amounts of computation power. Second, developing mechanisms to make these infrastructures more dynamic. These mechanisms, providing inter-cloud live migration, offer new ways to exploit the inherent dynamic nature of distributed clouds.

HAL-CentraleSupelec

Crossref

INRIA a CCSD electronic archive server

HAL-Rennes 1

Plates-formes d'exécution dynamiques sur des fédérations de nuages informatiques

Author: PRIOL Thierry
RITEAU Pierre
Publication date: 01/01/2011

OpenGrey Repository